Rare and Weak Effects in Large-scale Inference: Methods and Phase Diagrams
Abstract
Often when we deal with ‘Big Data’, the true effects we are interested in are Rare and Weak (RW). Researchers measure a large number of features, hoping to find perhaps only a small fraction of them to be relevant to the research in question; the effect sizes of the relevant features are individually small, so the true effects are not strong enough to stand out on their own. Higher Criticism (HC) and Graphlet Screening (GS) are two classes of methods that are specifically designed for Rare/Weak settings. HC was introduced to determine whether there are any relevant effects among the measured features. More recently, HC was applied to classification, where it provides a method for selecting useful predictive features for trained classification rules. GS was introduced as a graph-guided multivariate screening procedure, and was used for variable selection. We develop a theoretical framework in which an Asymptotic Rare and Weak (ARW) model simultaneously controls the size and prevalence of useful/significant features among the useless/null bulk. At the heart of the ARW model is the so-called phase diagram, which is a way to visualize clearly the class of ARW settings where the relevant effects are so rare or weak that the desired goals (signal detection, variable selection, etc.) are simply impossible to achieve. We show that HC and GS have important advantages over better-known procedures and achieve the optimal phase diagrams in a variety of ARW settings. HC and GS are flexible ideas that adapt easily to many interesting situations. We review the basics of these ideas and some of the recent extensions, discuss their connections to existing literature, and suggest some new applications of these ideas.
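The abstract does not spell out how HC is computed. As a rough illustration only, the sketch below uses the standard form of the HC statistic from the Higher Criticism literature (the maximum standardized gap between the sorted p-values and their expected uniform quantiles) together with a common ARW parametrization: signal fraction n^(-beta) and signal strength sqrt(2 r log n). The function names `higher_criticism` and `simulate_arw_pvalues`, the `alpha0` search fraction, and the one-sided Gaussian test are assumptions made for this example, not details taken from the paper.

```python
import math
import numpy as np


def higher_criticism(p_values, alpha0=0.5):
    """Return the HC statistic: max over the smallest alpha0 fraction of
    sorted p-values of sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    p = np.sort(np.asarray(p_values, dtype=float))
    n = p.size
    # Clip to keep the denominator away from zero (illustrative safeguard).
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    k = max(1, int(alpha0 * n))  # only search the smallest alpha0 * n p-values
    return float(hc[:k].max())


def simulate_arw_pvalues(n, beta, r, rng):
    """Draw one-sided p-values from an assumed ARW mixture:
    a fraction n**(-beta) of z-scores has mean sqrt(2 * r * log(n))."""
    eps = n ** (-beta)
    mu = math.sqrt(2.0 * r * math.log(n))
    is_signal = rng.random(n) < eps
    z = rng.standard_normal(n) + mu * is_signal
    # Upper-tail Gaussian probability P(Z > z) via the complementary error function.
    upper_tail = np.vectorize(lambda x: 0.5 * math.erfc(x / math.sqrt(2.0)))
    return upper_tail(z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 100_000
    null_p = rng.random(n)                               # pure-null p-values
    arw_p = simulate_arw_pvalues(n, beta=0.6, r=0.5, rng=rng)
    print("HC under the null:", higher_criticism(null_p))
    print("HC under ARW     :", higher_criticism(arw_p))
```

Under these assumptions, large HC values suggest that some non-null effects are present. Sweeping `beta` and `r` over a grid and recording where detection succeeds is one way to trace an empirical version of the phase diagram the abstract describes.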
Related papers
Higher Criticism for Large-Scale Inference, Especially for Rare and Weak Effects
In modern high-throughput data analysis, researchers perform a large number of statistical tests, expecting to find perhaps a small fraction of significant effects against a predominantly null background. Higher Criticism (HC) was introduced to determine whether there are any nonzero effects; more recently, it was applied to feature selection, where it provides a method for selecting useful pre...
Detection boundary and Higher Criticism approach for rare and weak genetic effects
Genome-wide association studies (GWAS) have identified many genetic factors underlying complex human traits. However, these factors have explained only a small fraction of these traits’ genetic heritability. It is argued that many more genetic factors remain undiscovered. These genetic factors likely are weakly associated at the population level and sparsely distributed across the genome. In th...
Geochemistry and petrology of gabbrodiorites from Palang Dar Area (Northeast Damghan)
The Palang Dar gabbrodioritic intrusion crops out about 30 km NE of Damghan, in the eastern part of the Alborz structural unit. In this area, a few small-scale gabbrodioritic intrusions and diabasic dikes of Middle Jurassic age intruded into shales, sandstones and limestones of the Shemshak Formation. Emplacement of this intrusion into the host rocks is associated with contact metamorphism and formati...
Causal Inference with Rare Events in Large-Scale Time-Series Data
Large-scale observational datasets are prevalent in many areas of research, including biomedical informatics, computational social science, and finance. However, our ability to use these data for decision-making lags behind our ability to collect and mine them. One reason for this is the lack of methods for inferring the causal impact of rare events. In cases such as the monitoring of continuou...
Drawing CCCT Diagrams and Investigation of Deformation Effects on Martensite and Bainite Transformations in NiCrMoV Steel
In this study, two CCCT diagrams are drawn to be compared with a CCT diagram. The CCCT diagrams represent continuous cooling transformations in the stress-assisted state. The increased Md and Bd temperatures of the CCCT diagrams were also compared with those of the CCT diagrams, and the cause was investigated from both thermodynamic and metallurgical viewpoints. Thermodynamic examinations revealed that ...